We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks are more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within task. We generalize this finding to meta-RL and study this dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate its significance empirically.
translated by 谷歌翻译
我们研究了一个顺序决策问题,其中学习者面临$ k $武装的随机匪徒任务的顺序。对手可能会设计任务,但是对手受到限制,以在$ m $ and的较小(但未知)子集中选择每个任务的最佳组。任务边界可能是已知的(强盗元学习设置)或未知(非平稳的强盗设置)。我们设计了一种基于Burnit subsodular最大化的减少的算法,并表明,在大量任务和少数最佳武器的制度中,它在两种情况下的遗憾都比$ \ tilde {o}的简单基线要小。 \ sqrt {knt})$可以通过使用为非平稳匪徒问题设计的标准算法获得。对于固定任务长度$ \ tau $的强盗元学习问题,我们证明该算法的遗憾被限制为$ \ tilde {o}(nm \ sqrt {m \ tau}+n^{2/3} m \ tau)$。在每个任务中最佳武器的可识别性的其他假设下,我们显示了一个带有改进的$ \ tilde {o}(n \ sqrt {m \ tau}+n^{1/2} {1/2} \ sqrt的强盗元学习算法{m k \ tau})$遗憾。
translated by 谷歌翻译
Underwater images are altered by the physical characteristics of the medium through which light rays pass before reaching the optical sensor. Scattering and strong wavelength-dependent absorption significantly modify the captured colors depending on the distance of observed elements to the image plane. In this paper, we aim to recover the original colors of the scene as if the water had no effect on them. We propose two novel methods that rely on different sets of inputs. The first assumes that pixel intensities in the restored image are normally distributed within each color channel, leading to an alternative optimization of the well-known \textit{Sea-thru} method which acts on single images and their distance maps. We additionally introduce SUCRe, a new method that further exploits the scene's 3D Structure for Underwater Color Restoration. By following points in multiple images and tracking their intensities at different distances to the sensor we constrain the optimization of the image formation model parameters. When compared to similar existing approaches, SUCRe provides clear improvements in a variety of scenarios ranging from natural light to deep-sea environments. The code for both approaches is publicly available at https://github.com/clementinboittiaux/sucre .
translated by 谷歌翻译
We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
translated by 谷歌翻译
Forecasting the state of vegetation in response to climate and weather events is a major challenge. Its implementation will prove crucial in predicting crop yield, forest damage, or more generally the impact on ecosystems services relevant for socio-economic functioning, which if absent can lead to humanitarian disasters. Vegetation status depends on weather and environmental conditions that modulate complex ecological processes taking place at several timescales. Interactions between vegetation and different environmental drivers express responses at instantaneous but also time-lagged effects, often showing an emerging spatial context at landscape and regional scales. We formulate the land surface forecasting task as a strongly guided video prediction task where the objective is to forecast the vegetation developing at very fine resolution using topography and weather variables to guide the prediction. We use a Convolutional LSTM (ConvLSTM) architecture to address this task and predict changes in the vegetation state in Africa using Sentinel-2 satellite NDVI, having ERA5 weather reanalysis, SMAP satellite measurements, and topography (DEM of SRTMv4.1) as variables to guide the prediction. Ours results highlight how ConvLSTM models can not only forecast the seasonal evolution of NDVI at high resolution, but also the differential impacts of weather anomalies over the baselines. The model is able to predict different vegetation types, even those with very high NDVI variability during target length, which is promising to support anticipatory actions in the context of drought-related disasters.
translated by 谷歌翻译
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
translated by 谷歌翻译
地震数据中的噪声来自许多来源,并且正在不断发展。使用监督的深度学习程序来降级地震数据集通常会导致性能差:这是由于缺乏无噪声的现场数据来充当训练目标以及合成数据集和现场数据集之间特性的巨大差异。自我监督,盲点网络通常通过直接在原始嘈杂的数据上训练来克服这些限制。但是,这样的网络通常依赖于随机噪声假设,并且在存在最小相关的噪声的情况下,它们的降解能力迅速降低。从盲点延伸到盲面可以有效地沿特定方向抑制连贯的噪声,但不能适应噪声的不断变化的特性。为了抢占网络预测信号并减少其学习噪声属性的机会的能力,我们在以自欺欺人的方式进行微调的方式,在节俭生成的合成数据集上对网络进行初始监督的培训。感兴趣的数据集。考虑到峰值信噪比的变化以及观察到的噪声量减少和信号泄漏的体积,我们说明了从监督的基础训练中的权重来初始化自我监督网络的明显好处。通过在字段数据集上进行的测试进一步支持,在该数据集中进行了微调网络在信号保存和降低噪声之间达到最佳平衡。最后,使用不切实际的,节俭生成的合成数据集用于监督的基础培训包括许多好处:需要最少的先验地质知识,大大降低了数据集生成的计算成本,并减少了重新训练的要求。网络应记录条件更改,仅举几例。
translated by 谷歌翻译
我们研究Claire(一种差异性多形状,多-GPU图像注册算法和软件)的性能 - 在具有数十亿素素的大规模生物医学成像应用中。在这样的分辨率下,大多数用于差异图像注册的软件包非常昂贵。结果,从业人员首先要大量删除原始图像,然后使用现有工具进行注册。我们的主要贡献是对降采样对注册性能的影响的广泛分析。我们通过将用Claire获得的全分辨率注册与合成和现实成像数据集的低分辨率注册进行比较,研究了这种影响。我们的结果表明,完全分辨率的注册可以产生卓越的注册质量 - 但并非总是如此。例如,将合成图像从$ 1024^3 $减少到$ 256^3 $将骰子系数从92%降低到79%。但是,对于嘈杂或低对比度的高分辨率图像,差异不太明显。克莱尔不仅允许我们在几秒钟内注册临床相关大小的图像,而且还可以在合理的时间内以前所未有的分辨率注册图像。考虑的最高分辨率是$ 2816 \ times3016 \ times1162 $的清晰图像。据我们所知,这是有关此类决议中图像注册质量的首次研究。
translated by 谷歌翻译
文档级信息提取(IE)任务最近开始使用端到端的神经网络技术对其句子级别的IE同行进行认真重新审视。但是,对方法的评估在许多维度上受到限制。特别是,Precision/Recell/F1分数通常报道,几乎没有关于模型造成的错误范围的见解。我们基于Kummerfeld和Klein(2013)的工作,为基于转换的框架提出了用于文档级事件和(N- ARY)关系提取的自动化错误分析的框架。我们采用我们的框架来比较来自三个域的数据集上的两种最先进的文档级模板填充方法;然后,为了衡量IE自30年前成立以来的进展,与MUC-4(1992)评估的四个系统相比。
translated by 谷歌翻译
基于学习的控制方案最近表现出了出色的效力执行复杂的任务。但是,为了将它们部署在实际系统中,保证该系统在在线培训和执行过程中将保持安全至关重要。因此,我们需要安全的在线学习框架,能够自主地理论当前的信息是否足以确保安全或需要新的测量。在本文中,我们提出了一个由两个部分组成的框架:首先,在需要时积极收集测量的隔离外检测机制,以确保至少一个安全备份方向始终可供使用;其次,基于高斯的基于过程的概率安全 - 关键控制器可确保系统始终保持安全的可能性。我们的方法通过使用控制屏障功能来利用模型知识,并以事件触发的方式从在线数据流中收集测量,以确保学习的安全至关重要控制器的递归可行性。反过来,这又使我们能够提供具有很高概率的安全集的正式结果,即使在先验未开发的区域中也是如此。最后,我们在自适应巡航控制系统的数值模拟中验证了所提出的框架。
translated by 谷歌翻译